Binary Patterning, Synthetic Biology, & The Hecht Lab at Princeton University



Modern proteomes have been molded by billions of years of evolution in response to biological and environmental selective pressures. While these evolutionary forces have produced proteins with exquisite specificity and functionality, the majority of sequence space was likely never sampled. This leaves open the possibility that some protein structures never sampled by evolution would be more efficacious than their modern analogues. Such structures are unlikely ever to arise in nature, because evolution committed long ago to the path that led to modern proteomes. A proteome unbiased by evolutionary forces would therefore be an extremely useful tool for examining sequence space unexplored by evolution. Such exploration could yield a better understanding of the true potential of sequence space and provide insight into the "logic" of the evolutionary forces that created modern proteomes.

It has been proposed that primitive enzymes were promiscuous, possessing broad substrate specificity. This would have allowed early organisms to perform the plethora of catalytic functions necessary for survival with a smaller repertoire of proteins, albeit at slower rates than modern enzymes achieve. This hypothesis, put forward by Jensen in 1976, is nearly impossible to test with natural proteins, as evolution has already biased every protein found in nature. To test Jensen’s hypothesis and to begin exploring the vast expanse of sequence space left unexamined by evolution, researchers such as Prof. Michael Hecht and colleagues have developed libraries of de novo proteins, unbiased by evolutionary artifacts.

Binary Patterning
An ideal unbiased library would be created by purely stochastic combinatorial methods. However, most of sequence space does not fold into protein-like tertiary structures and instead tends to aggregate, so purely stochastic methods do not work. Instead, methods were developed to direct sequences toward those that assume stable tertiary structures. One such method is called "binary patterning," and its use to develop vast, unbiased protein libraries is a primary focus of the Hecht Lab at Princeton University. Binary patterning is guided by two fundamental themes: (1) natural protein structures are composed predominantly of secondary structure elements, and (2) polar side chains are typically exposed on the surface while hydrophobic residues are buried. The method relies on the premise that exact residue identity is less important than the overall placement of polar and non-polar residues. Sequences that comply with these rules can be designed by constraining the periodicity of polar and non-polar residues to match the periodicity of the desired secondary structure. For an alpha-helical design, a non-polar amino acid must appear at every 3rd or 4th position to approximate the 3.6 residues per turn found in alpha-helices. A helix that follows this design has one polar face and one non-polar face (See Diagram). In this way, a combinatorial library can be directed toward regions of sequence space that contain a higher proportion of folded structures, while diversity is maintained by specifying only the polarity of each residue, not its identity.
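
The periodicity rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Hecht Lab's actual template: it assumes an idealized helix with exactly 100 degrees of rotation per residue (360 / 3.6), marks a residue as non-polar when it falls within an arbitrarily chosen 70-degree half-width of one helical face, and draws residues from two assumed sets (POLAR and NONPOLAR) that this article does not list. All function names are hypothetical.

```python
import random

# 360 degrees / 3.6 residues per turn, for an idealized alpha-helix.
DEGREES_PER_RESIDUE = 100

# Assumed residue sets for illustration only; the article does not specify them.
POLAR = "KHEQDN"
NONPOLAR = "FLIMV"


def helix_binary_pattern(length, face_half_width=70):
    """Return a string of 'N' (non-polar) and 'P' (polar) for one helix.

    A position is non-polar when its angle around the helical axis lies
    within face_half_width degrees of the chosen non-polar face (0 degrees).
    """
    pattern = []
    for i in range(length):
        angle = (i * DEGREES_PER_RESIDUE) % 360
        offset = min(angle, 360 - angle)  # angular distance from the face centre
        pattern.append("N" if offset <= face_half_width else "P")
    return "".join(pattern)


def sample_sequence(pattern, rng):
    """Pick a random residue of the required polarity at each position."""
    return "".join(
        rng.choice(NONPOLAR if kind == "N" else POLAR) for kind in pattern
    )


def matches_pattern(sequence, pattern):
    """Check that a sequence obeys the binary pattern."""
    return len(sequence) == len(pattern) and all(
        (res in NONPOLAR) if kind == "N" else (res in POLAR)
        for res, kind in zip(sequence, pattern)
    )


if __name__ == "__main__":
    pattern = helix_binary_pattern(18)
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(pattern)
    for _ in range(3):
        print(sample_sequence(pattern, rng))
```

Every sequence sampled this way shares the same polar/non-polar pattern but differs in residue identity, which is where the diversity of a binary-patterned library comes from.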

The First Generation Library
The Hecht Lab developed the first generation library using a degenerate DNA codon system to create a 74-residue template designed to form a 4-helix bundle. While NMR screening of the first generation library identified several proteins with native-like structures, most proteins in the library formed fluctuating molten globule structures.
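
To make the idea of a degenerate codon concrete, the sketch below expands two degenerate codons and translates every resulting codon with the standard genetic code. This article does not say which codons were used; VAN and NTN (V = A, C, or G; N = any base) are assumed here purely for illustration, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate-base symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Return the set of amino acids (or '*') a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


# Assumed codons, not taken from the article.
polar = encoded_residues("VAN")
nonpolar = encoded_residues("NTN")

if __name__ == "__main__":
    print("VAN ->", "".join(sorted(polar)))
    print("NTN ->", "".join(sorted(nonpolar)))
```

Under these assumed codons, each polar position can be one of six residues and each non-polar position one of five, and neither codon can produce a stop.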

The Second Generation Library
It was hypothesized that the predominant reason the first generation library did not form well-folded proteins, as numerous other studies would have predicted, was that the helices were not long enough. Longer helices give rise to larger interhelical interfaces, which add stability through additional van der Waals interactions. Hecht et al. used a molten globule-like protein from the first generation, Sequence #86, as a template for the second generation to demonstrate that the added length (7 additional residues per helix) was sufficient to convert a molten structure into a well-folded one. The NMR structure of a second generation library protein showed that the binary method is a viable system for creating stable, diverse proteins.

The solved structure of S-824 revealed that the protein is a four-helix bundle with a non-polar interior and polar exterior, as designed. Since 86% (88 of 102) of the residues in this protein were not explicitly specified, the library contains significant diversity. The 14 residues that were specifically selected lie in the chain termini and interhelical turns. Across the entire library, a significant percentage (~80%) of proteins are well ordered, validating the binary method. A second well-folded structure, S-836, was solved in 2008, supporting the conclusion that S-824 is a reasonable representative of the library as a whole.
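
The residue counts quoted in this article can be cross-checked with simple arithmetic. The sketch below assumes that the 7 residues added per helix apply to all four helices of the 74-residue first generation template; the variable names are illustrative only.

```python
FIRST_GEN_LENGTH = 74   # residues in the first generation template
HELICES = 4             # four-helix bundle
ADDED_PER_HELIX = 7     # residues added to each helix in the second generation

# Assumption: every helix was lengthened by the same amount.
second_gen_length = FIRST_GEN_LENGTH + HELICES * ADDED_PER_HELIX

SPECIFIED = 14          # residues fixed in the termini and interhelical turns
unspecified = second_gen_length - SPECIFIED
fraction_unspecified = unspecified / second_gen_length

if __name__ == "__main__":
    print(f"second generation length: {second_gen_length}")
    print(f"unspecified residues: {unspecified} ({fraction_unspecified:.0%})")
```

Under that assumption the numbers agree with the text: 102 residues in total, of which 88 (about 86%) are constrained only by polarity.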

Structure of S-836
S-836 is a left-turning 4-helix bundle. Since the identity of residues was not specified beyond polarity restrictions, diversity in hydrophobic packing is expected. In S-836, all polar residues are exposed to solvent while most non-polar residues are buried. However, three of the four methionine side chains are exposed to solvent, likely because of their proximity to large hydrophobic residues (i.e., Trp23, Phe47, Phe64, and Phe93). The hydrophobic core consists of five stacked packing layers. The bulky tryptophan side chain at position 23 is believed to seal off the hydrophobic core, and the network of cavities within it, from the solvent. This assertion is supported by the fact that all well-folded proteins characterized had a bulky tryptophan side chain at this position, while molten globule library members did not. While stable, S-836 has several dynamic residues, including residues 14, 23, 24, 34, 71, 74, 75, 76, 77, 87, and 100. Although internal cavities that lead to conformational mobility are not ideal for evolved, precise proteins, these properties are advantageous for primordial proteins, providing the catalytic versatility needed by early organisms functioning with a minimal enzyme repertoire. In fact, several of the second generation proteins that were screened exhibited promiscuous enzymatic activity, including esterase, lipase, and peroxidase activities, when bound to heme.

The Third Generation Library
Since the second generation library was small and of limited diversity due to its single starting point (Sequence #86), a third generation library was created. This library specified the identity of 34 residues near the turns between helices, but did not specify the identity of the majority of turn residues or the majority of residues involved in interhelical packing. The third generation library was considerably more diverse, containing ~10^6 sequences. Heme binding assays determined that nearly 66% of the third generation library binds heme (nearly 100% of those proteins that express well), with over half of these binding heme at high levels. The catalytic ability of library proteins was assessed both in the presence and absence of heme. Nearly 80% of the proteins that bound heme exhibited peroxidase activity (up to 10^6-fold faster than the uncatalyzed reaction), ~60% exhibited hydrolase activity (~10^3-fold faster than the uncatalyzed reaction), and 36% exhibited lipase activity (up to 10^3-fold faster than the uncatalyzed reaction). Also of note, nearly 30% of the proteins that bound heme exhibited some level of activity for all of these functions, highlighting the promiscuity of unevolved libraries. Even in the absence of cofactor, 30% of the third generation library exhibits esterase activity and 20% exhibits lipase activity, although at considerably lower rates than natural, evolved enzymes.
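
Because the heme-dependent activity percentages above are quoted relative to the heme-binding subset, it can help to convert them to approximate fractions of the whole library. The sketch below treats the rounded figures from the text ("nearly 66%", "nearly 80%", and so on) as exact values, so the results are rough estimates only; the names used are illustrative.

```python
# Fraction of the third generation library that binds heme (rounded figure from the text).
HEME_BINDING = 0.66

# Fractions of heme-binding proteins showing each activity (rounded figures from the text).
ACTIVITY_AMONG_BINDERS = {
    "peroxidase": 0.80,
    "hydrolase": 0.60,
    "lipase": 0.36,
    "all functions": 0.30,
}


def fraction_of_library(fraction_of_binders, heme_binding=HEME_BINDING):
    """Convert a fraction of heme binders into a fraction of the whole library."""
    return fraction_of_binders * heme_binding


if __name__ == "__main__":
    for name, frac in ACTIVITY_AMONG_BINDERS.items():
        print(f"{name}: roughly {fraction_of_library(frac):.0%} of the library")
```

These estimates only rescale the published percentages; they say nothing about proteins that fail to bind heme or to express.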

Conclusion
Using an unevolved combinatorial library created with a binary design, Hecht et al. produced a library of primitive enzymes untouched by evolution. The proteins in this library form stable folded structures, but contain several dynamic residues and display promiscuous substrate specificity. These results appear to validate Jensen’s hypothesis from 1976 that "primitive enzymes possessed a very broad specificity, permitting them to react with a wide range of related substrates." The eventual development of cells adapted to a particular environment, and the differentiation of cells with specific functions in higher organisms, required that proteins shed their promiscuous specificities. Although modern proteomes are well adapted to the functions they carry out, the vast unexplored sequence space likely holds structures that would be better suited to those functions. Hecht et al. continue their research on binary combinatorial libraries, especially toward finding biologically active proteins that are able to support cellular life in vivo, a critical step toward synthetic biology.

Additional Structures
1p68 – S-824

2jua – S-836

Additional Resources
The Hecht Lab Website